First, load all the data into a single object:
source("process_teams.R")
nba_data <- all_teams_data()
nba_data_1 <- nba_data %>%
mutate(across(-name, as.numeric))
The data has already been processed: each row is a player, and the metric (my ESPN league’s fantasy score) is calculated per game and accumulated over the season.
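To make that shape concrete, here is a minimal base-R sketch that turns a wide table (one row per player, one column per game date) into the long name/date/score/cumscore form seen in the `pivot_nba()` output further down. The internals of `pivot_nba()` are not shown here, so this is an assumption about what it does, not its actual code. The player IDs and scores are made up, and counting a missed game (NA) as 0 is also an assumption.

```r
# Hypothetical wide table: one row per player, one column per game date (made-up scores)
wide <- data.frame(
  name = c("playera01", "playerb01"),
  `2023-04-07` = c(20, 35),
  `2023-04-08` = c(NA, 10),   # NA: playera01 did not play that day
  `2023-04-09` = c(25, 15),
  check.names = FALSE
)

# Reshape to long form: one row per player per date
date_cols <- setdiff(names(wide), "name")
long <- data.frame(
  name  = rep(wide$name, times = length(date_cols)),
  date  = as.Date(rep(date_cols, each = nrow(wide))),
  score = unlist(wide[date_cols], use.names = FALSE)
)
long$score[is.na(long$score)] <- 0   # assumption: a missed game adds nothing

# Running total per player, in date order
long <- long[order(long$name, long$date), ]
long$cumscore <- ave(long$score, long$name, FUN = cumsum)
print(long)
```

Each player's last `cumscore` is their season total, which is what the plots below trace over time.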
Here’s a visual:
source("plot.R")
plot <- plot_nba(nba_data_1)
plot
As you can see, most players are concentrated toward the bottom, so more work is needed to uncover interesting trends. Given what is already plotted, a straightforward first step is to highlight the top n performers, in this case 20.
library(plotly)
pivoted_nba <- pivot_nba(nba_data_1) %>%
arrange(desc(cumscore)) %>%
distinct(name, .keep_all = TRUE) %>%
head(20) %>%
print()
## # A tibble: 20 × 4
## # Groups: name [20]
## name date score cumscore
## <chr> <date> <dbl> <dbl>
## 1 jokicni01 2023-04-08 22 2689
## 2 sabondo01 2023-04-09 25 2508
## 3 embiijo01 2023-04-06 24 2458
## 4 doncilu01 2023-04-07 12 2276.
## 5 tatumja01 2023-04-07 18 2274.
## 6 gilgesh01 2023-04-06 21.5 2162.
## 7 antetgi01 2023-04-04 31.5 2124.
## 8 vucevni01 2023-04-09 14.5 2074.
## 9 randlju01 2023-03-29 2 2005
## 10 davisan02 2023-04-09 30.5 1922.
## 11 adebaba01 2023-04-09 8.5 1904.
## 12 siakapa01 2023-04-07 22 1884.
## 13 derozde01 2023-04-09 16.5 1848.
## 14 youngtr01 2023-04-07 38 1840.
## 15 edwaran01 2023-04-09 28 1832.
## 16 mobleev01 2023-04-09 6 1814.
## 17 foxde01 2023-04-09 13 1811
## 18 claxtni01 2023-04-07 31.5 1806
## 19 butleji01 2023-04-06 29 1795
## 20 lillada01 2023-03-22 34 1785
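The `arrange(desc(cumscore)) %>% distinct(name, .keep_all = TRUE)` step keeps each player’s highest running total. When every game score is non-negative the running total never decreases, so that peak is the season total; once a game can score below zero, the peak and the final total can differ and the ranking can flip. A toy base-R illustration with made-up scores:

```r
# Made-up game scores for two players over three dates, already in date order
long <- data.frame(
  name  = rep(c("playera01", "playerb01"), each = 3),
  date  = rep(as.Date(c("2023-04-07", "2023-04-08", "2023-04-09")), times = 2),
  score = c(10, -4, 7, 8, 6, -3)
)
long$cumscore <- ave(long$score, long$name, FUN = cumsum)

peak  <- tapply(long$cumscore, long$name, max)                       # what arrange + distinct keeps
final <- tapply(long$cumscore, long$name, function(x) x[length(x)])  # running total on the last date

names(peak)[order(-peak)]    # ranking by peak running total
names(final)[order(-final)]  # ranking by final running total
```

Here playerb01 peaks higher (14 vs 13) but finishes lower (11 vs 13).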
plot2 <- pivot_nba(nba_data) %>%
filter(name %in% pivoted_nba$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Top 20 scorers", x = "Date", y = "Cumulative Score")
ggplotly(plot2, tooltip = c("y", "group"))
It is now clear that the highest line we saw previously belongs to Nikola Jokic. Many of the NBA’s top performers are here, although context is needed to tell who they are, because the labels are Basketball-Reference IDs rather than real names. Matching these “usernames” to real names is another layer of work that belongs back in the collection process. Interesting names here include the second-highest (purple) line, Domantas Sabonis, and Nic Claxton, who came somewhat out of nowhere in the 2022-23 season.
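One way to handle the ID-to-name step is a lookup vector keyed by ID. This sketch is hypothetical: `id_to_name` covers only the players named above, and a real version would need a complete mapping built during collection. Unmapped IDs fall back to the raw ID.

```r
# Hypothetical lookup: Basketball-Reference ID -> display name (only three entries)
id_to_name <- c(
  jokicni01 = "Nikola Jokic",
  sabondo01 = "Domantas Sabonis",
  claxtni01 = "Nic Claxton"
)

ids <- c("jokicni01", "claxtni01", "randlju01")
display <- unname(id_to_name[ids])
display[is.na(display)] <- ids[is.na(display)]  # fall back to the raw ID when unmapped
display
```

The same vector could relabel the `name` column before plotting, so the tooltips show real names.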
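games_played_data <- all_games_player()
# Running totals on 2023-02-16 (before the All-Star break) and 2023-04-09 (end of the regular season);
# later minus earlier keeps the sign if a player's running total fell after the break
nba_data_3 <- pivot_nba(nba_data) %>%
filter(date == "2023-04-09" | date == "2023-02-16") %>%
group_by(name) %>%
arrange(date, .by_group = TRUE) %>%
summarise(total_score = last(cumscore), post_asg_score = last(cumscore) - first(cumscore))
# Per-game averages for the full season and for the post-break stretch
nba_data_4 <- full_join(nba_data_3, games_played_data, by = "name") %>%
mutate(avg_score = total_score / games_played, post_asg_avg = post_asg_score / after_all_star)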
top_pg_scorers <- nba_data_4 %>%
arrange(desc(avg_score)) %>%
head(20) %>%
select(name, avg_score)
top_pg_scorers
## # A tibble: 20 × 2
## name avg_score
## <chr> <dbl>
## 1 jokicni01 39.0
## 2 embiijo01 37.2
## 3 doncilu01 34.5
## 4 davisan02 34.3
## 5 antetgi01 33.7
## 6 duranke01 32.0
## 7 gilgesh01 31.8
## 8 sabondo01 31.7
## 9 lillada01 30.8
## 10 tatumja01 30.7
## 11 jamesle01 30.6
## 12 curryst01 29.9
## 13 hardeja01 28.1
## 14 butleji01 28.0
## 15 willizi01 27.8
## 16 irvinky01 27.7
## 17 halibty01 27.3
## 18 porzikr01 27.0
## 19 leonaka01 26.6
## 20 markkla01 26.6
plot3 <- pivot_nba(nba_data) %>%
filter(name %in% top_pg_scorers$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Top 20 per-game scorers", x = "Date", y = "Cumulative Score")
ggplotly(plot3, tooltip = c("y", "group"))
Stand-out names that weren’t in the previous graph include, most obviously, Zion Williamson, as well as Leonard and Durant, while Lillard jumps from 20th in total score to 9th per game. This lines up with the most common reason talented players who perform well game to game fail to accrue totals and burn the managers who drafted them: long-term injuries.
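A small worked example of why the two rankings disagree, using the same `avg_score = total_score / games_played` formula as above. Both players are hypothetical: one durable, one who missed most of the season.

```r
# Two hypothetical players: one durable, one who missed most of the season
players <- data.frame(
  name         = c("durable01", "injured01"),
  total_score  = c(2100, 1200),
  games_played = c(70, 30)
)
players$avg_score <- players$total_score / players$games_played   # 30 vs 40 per game

players$name[order(-players$total_score)]  # by season total: durable01 first
players$name[order(-players$avg_score)]    # by per-game average: injured01 first
```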
top_pg_scorers_asg <- nba_data_4 %>%
arrange(desc(post_asg_score)) %>%
head(20) %>%
select(name, post_asg_score)
top_pg_scorers_asg
## # A tibble: 20 × 2
## name post_asg_score
## <chr> <dbl>
## 1 embiijo01 810
## 2 sabondo01 792.
## 3 davisan02 715
## 4 jokicni01 694.
## 5 butleji01 632
## 6 ingrabr01 582.
## 7 tatumja01 570.
## 8 jacksja02 562.
## 9 bookede01 559
## 10 siakapa01 555
## 11 lavinza01 552
## 12 antetgi01 551
## 13 bridgmi01 548.
## 14 claxtni01 542.
## 15 vucevni01 541
## 16 giddejo01 538
## 17 leonaka01 534.
## 18 foxde01 534.
## 19 lopezbr01 526
## 20 youngtr01 522.
plot4 <- pivot_nba(nba_data) %>%
filter(name %in% top_pg_scorers_asg$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Top 20 scorers after the all star break", x = "Date", y = "Cumulative Score")
ggplotly(plot4, tooltip = c("y", "group"))
In fantasy we sometimes talk about “playoff winners”: players who carry their managers through the fantasy playoffs at the end of the season. Here we look at players who saved the best for last. Embiid leads the post-break totals and Jokic’s season-long line still sits on top, but new names include Mikal Bridges, whose mid-season trade to the Nets unlocked a new facet of his game, and Brandon Ingram, who shook off extensive early-season injuries to finish strongly after the All-Star break.
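The post-break numbers come from a simple difference: the running total at the end of the regular season minus the running total at the break date used in the code above (2023-02-16), divided by games played after the break for the per-game version. A base-R sketch with made-up running totals and game counts; taking the later value minus the earlier one, rather than the larger minus the smaller, keeps the sign when a running total falls.

```r
# Hypothetical running totals on the two dates used above
cum <- data.frame(
  name     = rep(c("playera01", "playerb01"), each = 2),
  date     = as.Date(c("2023-02-16", "2023-04-09", "2023-02-16", "2023-04-09")),
  cumscore = c(1500, 2100, 400, 380)
)
after_all_star <- c(playera01 = 20, playerb01 = 10)   # hypothetical games played after the break

post_asg_score <- sapply(split(cum, cum$name), function(d) {
  d <- d[order(d$date), ]
  d$cumscore[nrow(d)] - d$cumscore[1]   # later running total minus earlier one
})
post_asg_avg <- post_asg_score / after_all_star[names(post_asg_score)]

post_asg_score   # playera01 600, playerb01 -20 (max - min would report +20)
post_asg_avg     # playera01 30,  playerb01 -2
```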
top_risers <- nba_data_4 %>%
arrange(desc(post_asg_avg - avg_score)) %>%
head(20) %>%
select(name, avg_score, post_asg_avg)
top_risers
## # A tibble: 20 × 3
## name avg_score post_asg_avg
## <chr> <dbl> <dbl>
## 1 hasleud01 3.07 19.5
## 2 isaacjo01 8.45 18.5
## 3 maledth01 9.58 16.8
## 4 halibty01 27.3 34.2
## 5 lawsoaj01 3.4 10.2
## 6 kesslwa01 19.1 25.7
## 7 pritcpa01 5.34 11.7
## 8 hortota01 11.3 17.6
## 9 theisda01 8.43 14.5
## 10 mamuksa01 8.44 14.4
## 11 colliza01 16.1 21.8
## 12 mcgruro01 6.42 12.1
## 13 willija06 17.5 23.1
## 14 tillmxa01 12.5 18.1
## 15 reaveau01 14.9 20.4
## 16 azubuud01 6.74 12.2
## 17 quickim01 16.1 21.4
## 18 nworajo01 9.37 14.7
## 19 whiteja03 2.09 7.33
## 20 sochaje01 12.8 17.9
plot5 <- pivot_nba(nba_data) %>%
filter(name %in% top_risers$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Top 20 most improved after the all star break", x = "Date", y = "Cumulative Score")
ggplotly(plot5, tooltip = c("y", "group"))
Some of the players who improved after the All-Star break simply went from close to zero to something because their teams started to tank toward the end of the season. More interesting are names like Kessler, Jalen Williams, and Zach Collins, who made big strides through general improvement or a new team situation.
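One way to separate the tanking beneficiaries from genuine improvers is to require a minimum season-long baseline before ranking the gain. The cutoff `min_baseline` is a hypothetical parameter, not part of the analysis above; the values are the rounded figures from the `top_risers` table.

```r
# Rounded values from the top_risers table above (a subset)
risers <- data.frame(
  name         = c("hasleud01", "isaacjo01", "kesslwa01", "colliza01", "willija06"),
  avg_score    = c(3.07, 8.45, 19.1, 16.1, 17.5),
  post_asg_avg = c(19.5, 18.5, 25.7, 21.8, 23.1)
)
min_baseline <- 10   # hypothetical cutoff on the full-season per-game average

risers$gain <- risers$post_asg_avg - risers$avg_score
established <- risers[risers$avg_score >= min_baseline, ]
established$name[order(-established$gain)]   # kesslwa01, colliza01, willija06
```

With this cutoff the ranking surfaces exactly the three improvers named above, although any fixed threshold is a judgement call.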
nba_data_5 <- all_teams_data(TRUE) # Modified to represent category-league value
plot_nba(nba_data_5 %>% mutate(across(-name, as.numeric)))
A (somewhat crude) attempt to represent value in a category league. No prizes for guessing who comes in first.
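For comparison, a common way to value players in category leagues (often nine categories) is to standardize each category as a z-score and add them up, flipping the sign for turnovers. This is a swapped-in, widely used technique, not necessarily what `all_teams_data(TRUE)` does; the four players and their stats are made up, and only four categories are used.

```r
# Hypothetical per-game stats for four players in four categories
stats <- data.frame(
  name = c("playera01", "playerb01", "playerc01", "playerd01"),
  pts  = c(30, 20, 10, 15),
  reb  = c(10, 5, 12, 3),
  ast  = c(8, 3, 2, 10),
  tov  = c(3, 1, 2, 4)
)
cats <- c("pts", "reb", "ast", "tov")

z <- scale(stats[cats])      # each column becomes (x - mean) / sd
z[, "tov"] <- -z[, "tov"]    # fewer turnovers is better, so flip the sign
stats$value <- rowSums(z)    # equal weight per category
stats[order(-stats$value), c("name", "value")]
```

Because every category is centered, the values sum to zero, so below-average players get negative totals, which is why the category plot has lines that fall below zero.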
nba_data_6 <- nba_data_5 %>%
mutate(across(-name, as.numeric))
pivoted_nba_1 <- pivot_nba(nba_data_6) %>%
arrange(desc(cumscore)) %>%
distinct(name, .keep_all = TRUE) %>%
head(20) %>%
print()
## # A tibble: 20 × 4
## # Groups: name [20]
## name date score cumscore
## <chr> <date> <dbl> <dbl>
## 1 jokicni01 2023-04-08 1.29 517.
## 2 embiijo01 2023-04-04 21.1 491.
## 3 gilgesh01 2023-04-04 3.63 481.
## 4 davisan02 2023-04-09 12.3 369.
## 5 butleji01 2023-04-06 3.03 336.
## 6 tatumja01 2023-04-04 0.174 323.
## 7 irvinky01 2023-04-05 9.39 316.
## 8 lillada01 2023-03-22 1.99 290.
## 9 duranke01 2023-04-02 10.4 289.
## 10 curryst01 2023-04-09 5.97 284.
## 11 doncilu01 2023-04-01 16.2 269.
## 12 porzikr01 2023-03-28 16.3 266.
## 13 jacksja02 2023-04-07 13.8 262.
## 14 halibty01 2023-03-25 7.13 257.
## 15 leonaka01 2023-04-08 4.45 252.
## 16 vanvlfr01 2023-04-04 13.1 252.
## 17 mitchdo01 2023-04-04 10.1 246.
## 18 hardeja01 2023-04-04 9.28 217.
## 19 sabondo01 2023-03-27 3.53 205.
## 20 bridgmi01 2023-04-02 4.43 188.
plot6 <- pivot_nba(nba_data_6) %>%
filter(name %in% pivoted_nba_1$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Top 20 category performers", x = "Date", y = "Cumulative Score")
ggplotly(plot6, tooltip = c("y", "group"))
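Names that jump into the top 20 include Irving, Curry, and VanVleet, who weren’t there in the points ranking, and Lillard, who missed too many games to rank higher than 20th in points but climbs to 8th here. This tracks with what we know about the difference between the two scoring systems: points mostly reward volume, while categories also reward efficiency and specialty stats such as three-pointers.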
pivoted_nba_2 <- pivot_nba(nba_data_6) %>%
arrange(cumscore) %>%
distinct(name, .keep_all = TRUE) %>%
head(20) %>%
print()
## # A tibble: 20 × 4
## # Groups: name [20]
## name date score cumscore
## <chr> <date> <dbl> <dbl>
## 1 iveyja01 2023-04-05 -5.80 -376.
## 2 barrerj01 2023-04-09 -9.00 -368.
## 3 mathube01 2023-04-07 -5.89 -320.
## 4 banchpa01 2023-04-04 -5.96 -296.
## 5 greenja05 2023-04-09 -2.57 -285.
## 6 greenje02 2023-04-08 -10.6 -267.
## 7 brissos01 2023-04-09 -11.2 -248.
## 8 landajo01 2023-04-07 -6.80 -231.
## 9 martike04 2023-04-09 -7.14 -231.
## 10 osmande01 2023-04-09 -9.00 -229.
## 11 westbru01 2023-03-27 -1.40 -226.
## 12 poolejo01 2023-04-09 -0.415 -220.
## 13 monkma01 2023-03-27 -10.6 -219.
## 14 poweldw01 2023-04-02 -8.73 -218.
## 15 grahade01 2023-04-04 -4.72 -218.
## 16 marshna01 2023-04-09 -10.2 -212.
## 17 clarkjo01 2023-03-05 -0.756 -212.
## 18 kuminjo01 2023-04-07 -3.97 -212.
## 19 huntede01 2023-04-09 -5.77 -210.
## 20 drumman01 2023-04-09 -6.85 -209.
plot7 <- pivot_nba(nba_data_6) %>%
filter(name %in% pivoted_nba_2$name) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
labs(title = "Bottom 20 category performers", x = "Date", y = "Cumulative Score")
ggplotly(plot7, tooltip = c("y", "group"))
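The bottom category performers are more interesting than the bottom points performers because these players actually play: someone who barely gets on the floor accumulates close to nothing, while sinking this far below zero takes real minutes. These names are familiar to fans, possibly for reasons that aren’t so nice.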
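Percentage categories show why playing time makes a poor performer worse rather than better. A common approach weights FG% by volume: made shots minus what a league-average shooter would make on the same attempts. This is a standard technique offered for illustration, not necessarily the one used above; the league average and the shot counts are hypothetical.

```r
league_fg_pct <- 0.47   # hypothetical league-average FG%

shots <- data.frame(
  name = c("lowvolume01", "highvolume01"),
  fgm  = c(2, 8),
  fga  = c(5, 20)
)
shots$fg_pct    <- shots$fgm / shots$fga                    # both shoot 40%
shots$fg_impact <- shots$fgm - league_fg_pct * shots$fga    # makes above/below an average shooter
shots
```

Both players shoot 40%, but the high-volume one costs four times as much (-1.4 vs -0.35 makes per game), so a player who plays a lot while shooting poorly sinks fastest.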
# Plot the cumulative score of any set of players, given their Basketball-Reference IDs
search_and_plot <- function(players) {
p <- pivot_nba(nba_data) %>%
filter(name %in% players) %>%
ggplot(aes(x = date, y = cumscore, color = name, group = name)) +
geom_line() +
theme_classic() +
theme(legend.position = "none") +
labs(title = "Searched Players", x = "Date", y = "Cumulative Score")
ggplotly(p, tooltip = c("y", "group"))
}
search_and_plot(c("youngtr01", "willizi01", "bridgmi01", "poolejo01", "willija06"))
The last component is letting users input the names themselves, which unfortunately requires them to know each player’s Basketball-Reference ID. Luckily, it follows a straightforward formula: the first five letters of the last name, plus the first two letters of the first name, plus an identifying number. If that letter combination is unique in NBA history, the number is probably 01.
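The formula can be sketched as a small helper that builds IDs to pass to `search_and_plot()`. `make_bref_id()` is a hypothetical name; the sketch assumes plain ASCII input (an accented letter such as the ć in Jokić would need transliterating first) and may not cover every edge case in Basketball-Reference’s scheme.

```r
# Build a Basketball-Reference-style ID: last[1:5] + first[1:2] + two-digit number
make_bref_id <- function(first, last, number = 1) {
  clean <- function(x) gsub("[^a-z]", "", tolower(x))   # keep lowercase ASCII letters only
  sprintf("%s%s%02d", substr(clean(last), 1, 5), substr(clean(first), 1, 2), as.integer(number))
}

make_bref_id("Nikola", "Jokic")        # "jokicni01"
make_bref_id("De'Aaron", "Fox")        # "foxde01"
make_bref_id("Jalen", "Williams", 6)   # "willija06"
```

Since the function is vectorized, a usage example would be `search_and_plot(make_bref_id(c("Zion", "Trae"), c("Williamson", "Young")))`.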